A TREC Along the Spam Track with SpamBayes
نویسنده
چکیده
This paper describes the SpamBayes submissions made to the Spam Track of the 2005 Text Retrieval Conference (TREC). SpamBayes is briefly introduced, but the paper focuses more on how the submissions differ from the standard installation. Unlike in the majority of earlier publications evaluating the effectiveness of SpamBayes, the fundamental ‘unsure’ range is discussed, and the method of removing the range is outlined. Finally, an analysis of the results of the running the four submissions through the Spam Track ‘jig’ with the three private corpora and one public corpus is made.
منابع مشابه
Machine Learning in the Presence of an Adversary: Attacking and Defending the SpamBayes Spam Filter
Machine Learning in the Presence of an Adversary: Attacking and Defending the SpamBayes Spam Filter
متن کاملSNUMedinfo at TREC Web track 2014
This paper describes the participation of the SNUMedinfo team at the TREC Web track 2014. This is the first time we participate in the Web track. Rather than applying more sophisticated retrieval method such as learning to rank models, this year we used only baseline retrieval models with spam filtering and pagerank prior.
متن کاملBUPT at TREC 2006: Spam Track
This report summarizes our participation in the TREC 2006 spam track, in which we consider the use of Bayesian models for the spam filtering task. Firstly, our anti-spam filter, Kidult, is briefly introduced. And then we try to use weighted adjustment of separating hyperplane and selective classifiers ensemble to improve the filtering performance. Finally, we summarize the relevant results from...
متن کاملAn evaluation of Naive Bayes variants in content-based learning for spam filtering
We describe an in-depth analysis of spam-filtering performance of a simple Naive Bayes learner and two current variants. A set of seven mailboxes comprising about 65,000 mails from seven different users, as well as a representative snapshot of 25,000 mails which were received over 18 weeks by a single user, were used for evaluation. Our main motivation was to test whether two variants of Naive ...
متن کاملSpam Filtering Using Character-Level Markov Models: Experiments for the TREC 2005 Spam Track
This paper summarizes our participation in the TREC 2005 spam track, in which we consider the use of adaptive statistical data compression models for the spam filtering task. The nature of these models allows them to be employed as Bayesian text classifiers based on character sequences. We experimented with two different compression algorithms under varying model parameters. All four filters th...
متن کامل